Choosing BLAST options for better detection of orthologs as reciprocal best hits

Authors

  • Gabriel Moreno-Hagelsieb
  • Kristen Latimer

Abstract

MOTIVATION: The analysis of the increasing number of genome sequences requires shortcuts for the detection of orthologs, such as Reciprocal Best Hits (RBH), where two genes, each in a different genome, are assumed to be orthologs if they find each other as the best hit in the other genome. Two BLAST options seem to affect alignment scores the most, and thus the choice of a best hit: the filtering of low-information sequence segments and the algorithm used to produce the final alignment. We therefore tested whether changing these options would improve the detection of orthologs.

RESULTS: Using Escherichia coli K12 as an example, we compared the number and quality of orthologs detected as RBH. We tested four conditions derived from two options: filtering of low-information segments, hard (default) versus soft; and alignment algorithm, default (based on matching words) versus Smith-Waterman. All options resulted in significant differences in the number of orthologs detected, with the highest numbers obtained by combining soft filtering with Smith-Waterman alignments. We compared these results with those of Reciprocal Shortest Distances (RSD), which has been proposed as superior to RBH because it ranks homologs, and thus detects orthologs, using an evolutionary measure of distance rather than BLAST statistics. RSD barely increased the number of orthologs detected over those found with RBH. Error estimates based on the conservation of gene order found only small differences in the quality of orthologs detected using RBH; RSD, however, showed the highest error rates. Thus, RSD has no advantage over RBH.

AVAILABILITY: Orthologs detected as Reciprocal Best Hits using soft masking and Smith-Waterman alignments can be downloaded from http://popolvuh.wlu.ca/Orthologs.
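
The RBH criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each genome-versus-genome search has been saved in BLAST's 12-column tabular format (the default column order of BLAST+ `-outfmt 6`, with the bit score last), ranks hits by bit score, and keeps tied best hits. The function names and the gene identifiers in the example are hypothetical.

```python
"""Minimal sketch: reciprocal best hits (RBH) from BLAST tabular output."""
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield (query, subject, bit score) from 12-column BLAST tabular lines.

    Assumes the default column order of BLAST+ -outfmt 6, where the bit
    score is the 12th (last) column. Comment and blank lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[11])


def best_hits(hits: Iterable[Hit]) -> Dict[str, Set[str]]:
    """Map each query to the subject(s) with its highest bit score.

    Ties are kept, so a query can have more than one best hit; this is a
    design choice of the sketch, not a rule taken from the paper.
    """
    top_score: Dict[str, float] = {}
    top_subjects: Dict[str, Set[str]] = {}
    for query, subject, score in hits:
        best = top_score.get(query)
        if best is None or score > best:
            top_score[query] = score
            top_subjects[query] = {subject}
        elif score == best:
            top_subjects[query].add(subject)
    return top_subjects


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (gene in A, gene in B) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = [
        (a, b)
        for a, subjects in best_ab.items()
        for b in subjects
        if a in best_ba.get(b, set())
    ]
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical hits with made-up gene names and scores.
    genome_a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
                     ("a2", "b2", 120.0), ("a3", "b2", 118.0)]
    genome_b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 121.0),
                     ("b2", "a3", 119.0)]
    print(reciprocal_best_hits(genome_a_vs_b, genome_b_vs_a))
    # prints [('a1', 'b1'), ('a2', 'b2')]
```

In the example, a3 is not paired with b2 because b2's best hit in genome A is a2. Since the BLAST options compared in the paper (hard versus soft filtering, default versus Smith-Waterman alignment) change the alignment scores, re-running the searches under each condition and passing the resulting reports to the same function is one way to compare how many pairs each condition yields.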

Related articles

Reciprocal best hits are not a logically sufficient condition for orthology

It is common to use reciprocal best hits, also known as a boomerang criterion, for determining orthology between sequences. The best hits may be found by blast, or by other more recently developed algorithms. Previous work seems to have assumed that reciprocal best hits is a sufficient but not necessary condition for orthology. In this article, I explain why reciprocal best hits cannot logicall...

Detecting putative orthologs

We developed an algorithm that improves upon the common procedure of taking reciprocal best BLAST hits (RBH) in the identification of orthologs. The method, the reciprocal smallest distance algorithm (RSD), relies on global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it i...

Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss?

Reciprocal Best Hits (RBH) are a common proxy for orthology in comparative genomics. Essentially, an RBH is found when the proteins encoded by two genes, each in a different genome, find each other as the best scoring match in the other genome. NCBI's BLAST is the software most commonly used for the sequence comparisons necessary to find RBHs. Since sequence comparison can be time consuming, w...

Ortholog detection using the reciprocal smallest distance algorithm.

All protein-coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. One important part of reconstructing the relationships among genes in different organisms is an accurate method to find orthologs as well as an accurate measure of ...

Journal:
  • Bioinformatics

Volume 24, Issue 3

Publication date: 2008